Segment selection in the L&H RealSpeak Laboratory TTS system
Authors
Abstract
The L&H RealSpeak Laboratory TTS (RSLab) system is a corpus-based speech synthesis system comprising components that deal with linguistic processing, prosody prediction, segment selection, concatenation and modification. In this paper we focus on the segment selection process. During segment selection, the units in a large database of speech are scored with a cost according to their prosodic/phonetic mismatch with the target description of the utterance to be synthesized. This prosodic/phonetic cost is computed on the basis of a combination of symbolic and numeric features. The candidate units from the speech database are then evaluated for the ease with which they can be concatenated. A dynamic programming algorithm, using additive costs, is used to find the optimal path of candidates to represent the spoken utterance. The chosen segments are then concatenated in the time domain to yield a smooth-sounding speech signal with natural-sounding prosody. One of the keys to the success of the segment selection component is the context-dependent choice of cost functions and the method of combining the costs from the various features. The RSLab system makes use of a family of complex cost functions that allows linguistic and perceptual knowledge to be incorporated in the segment selection process.
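The dynamic programming search described in the abstract can be illustrated with a minimal Viterbi-style sketch. This is not the RSLab implementation: the function name `select_units` and the callables `target_cost` and `join_cost` are hypothetical stand-ins for the prosodic/phonetic mismatch cost and the concatenation cost, and the sketch assumes only that the two costs are summed along a path.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target description of one segment (hypothetical type)
U = TypeVar("U")  # candidate unit from the speech database (hypothetical type)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Return the candidate path with the lowest summed cost, and that cost.

    candidates[i] lists the database units considered for targets[i].
    Costs are additive: each chosen unit adds its target cost, and each
    pair of adjacent chosen units adds a join (concatenation) cost.
    """
    if len(targets) != len(candidates):
        raise ValueError("one candidate list is required per target")
    if not targets:
        return [], 0.0
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate")

    # best[i][j]: lowest cost of any path that ends at candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor in candidates[i - 1] on that path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [
                best[i - 1][k] + join_cost(prev, u)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(targets[i], u))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

Because the join cost couples neighbouring choices, the cheapest path may pass through a unit that is not the best match for its own target. The sketch uses a plain sum of two costs, whereas the abstract describes context-dependent cost functions and a specific method of combining feature costs, neither of which is modelled here.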
Similar resources
Acoustic and perceptual analysis of discontinuities in two TTS concatenation systems
Background Discontinuities It is fair to say that L&H’s (now Scansoft’s) RealSpeak and AT&T’s NextGen are two of the most natural sounding unit selection systems. The transitions between connected units sometimes contain discontinuities, thus creating one of the greatest problems concerning the output in these kinds of systems. The discontinuities are often perceived as ‘jumps’, i.e. a disturba...
Strategy-aligned fuzzy approach for market segment evaluation and selection: a modular decision support system by dynamic network process (DNP)
In competitive markets, market segmentation is a critical point of business, and it can be used as a generic strategy. In each segment, strategies lead companies to their targets; thus, segment selection and the application of the appropriate strategies over time are very important to achieve successful business. This paper aims to model a strategy-aligned fuzzy approach to market segment ev...
ERIC: an agent framework for embodied real-time intelligent commentary
6 Process of development The agent framework was developed between February and May 2007, as part of the author’s Masters thesis. 7 Resources used • Nuance RealSpeak Solo for TTS speech generation, http://www.nuance.com/realspeak/solo • Charamel CharaVirld2 character environment, and CharaSpy debugger, http://www.charamel.de • Jess: the Java Expert System Shell, http://herzberg.ca.sandia.gov/je...
A New Model for Best Customer Segment Selection Using Fuzzy TOPSIS Based on Shannon Entropy
In today’s competitive market, for a business firm to win higher profit among its rivals, it is of necessity to evaluate, and rank its potential customer segments to improve its Customer Relationship Management (CRM). This brings the importance of having more efficient decision making methods considering the current fast growing information era. These decisions usually involve several criteria,...
High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion
Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for various purposes, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading. Corpus-based TTS makes it possible to dramatically improve the naturalness of synthetic speech compared with the early TTS. However, no general-purpos...
Publication date: 2000